Conversation
When _prune_old_tool_results truncates a large tool_call arguments
string, it byte-slices to 200 chars and appends the literal
sentinel "...[truncated]". The resulting string is not valid JSON
(unterminated mid-value), and some providers — notably Anthropic via
LiteLLM — re-parse function.arguments when rebuilding tool_use
blocks. The next API call fails with HTTP 400 ("Unterminated string
starting at: line 1 column 35"), and every subsequent turn replays
the same corrupted history, so the session is permanently stuck.
Observed in the wild: a session that compressed with 11 prior patch
tool_calls (each >500 chars) then returned HTTP 400 on every inbound
message. The 3 retries are identical payloads, so they all fail the
same way. The user sees only a canned error fallback.
Replace the byte slice with a compact JSON sentinel that preserves
provenance (original length + 200-char preview) while staying
parseable.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
handsdiff pushed a commit that referenced this pull request on Apr 18, 2026
…ts (NousResearch#11745)

Move moonshotai/kimi-k2.5 to position #1 in every model picker list:
- OPENROUTER_MODELS (with 'recommended' tag)
- _PROVIDER_MODELS: nous, kimi-coding, opencode-zen, opencode-go, alibaba, huggingface
- _model_flow_kimi() Coding Plan model list in main.py

The kimi-coding-cn and moonshot lists already had kimi-k2.5 first.
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
When the live Vercel AI Gateway catalog exposes a Moonshot model with zero input AND output pricing, it is promoted to position #1 as the recommended default — even if the exact ID isn't in the curated AI_GATEWAY_MODELS list. This enables dynamic discovery of new free Moonshot variants without requiring a PR to update curation. Paid Moonshot models are unaffected, and the picker falls back to the normal curated recommended tag when no free Moonshot is live.
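The promotion rule above could be sketched as follows; the catalog entry shape, field names, and the helper name are assumptions for illustration, not the project's real API:

```python
def promote_free_moonshot(catalog: list[dict], curated: list[str]) -> list[str]:
    """Put a zero-priced live Moonshot model at position #1, else keep curation.

    Hypothetical sketch: assumes catalog entries carry 'id', 'input_price',
    and 'output_price' keys, which is an illustration, not the gateway schema.
    """
    for entry in catalog:
        is_moonshot = entry.get("id", "").startswith("moonshotai/")
        # Both prices must be zero; default to a nonzero price when missing
        # so absent pricing data never counts as "free".
        is_free = (
            float(entry.get("input_price", 1)) == 0.0
            and float(entry.get("output_price", 1)) == 0.0
        )
        if is_moonshot and is_free:
            # Dynamic discovery: the ID need not appear in the curated list.
            rest = [m for m in curated if m != entry["id"]]
            return [entry["id"]] + rest
    # No free Moonshot live: normal curated order with its recommended tag.
    return curated
```

Defaulting missing prices to a nonzero value is the conservative choice: a model is only promoted on positive evidence that it is free.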
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
…#13354)

Classic-CLI /steer typed during an active agent run was queued through self._pending_input alongside ordinary user input. process_loop, which drains that queue, is blocked inside self.chat() for the entire run, so the queued command was not pulled until AFTER _agent_running had flipped back to False — at which point process_command() took the idle fallback ("No agent running; queued as next turn") and delivered the steer as an ordinary next-turn user message.

From Utku's bug report on PR NousResearch#13205: a mid-run /steer arrived minutes later, at the end of the turn, as a /queue-style message, completely defeating its purpose.

Fix: add _should_handle_steer_command_inline() gating — when _agent_running is True and the user typed /steer, dispatch process_command(text) directly from the prompt_toolkit Enter handler on the UI thread instead of queueing. This mirrors the existing _should_handle_model_command_inline() pattern for /model and is safe because agent.steer() is thread-safe (uses _pending_steer_lock, no prompt_toolkit state mutation, instant return).

No changes to the idle-path behavior: /steer typed with no active agent still takes the normal queue-and-drain route, so the fallback "No agent running; queued as next turn" message is preserved.

Validation:
- 7 new unit tests in tests/cli/test_cli_steer_busy_path.py covering the detector, dispatch path, and idle-path control behavior.
- All 21 existing tests in tests/run_agent/test_steer.py still pass.
- Live PTY end-to-end test with real agent + real openrouter model:
  22:36:22 API call #1 (model requested execute_code)
  22:36:26 ENTER FIRED: agent_running=True, text='/steer ...'
  22:36:26 INLINE STEER DISPATCH fired
  22:36:43 agent.log: 'Delivered /steer to agent after tool batch'
  22:36:44 API call #2 included the steer; response contained marker

The same test on the tip of main without this fix shows the steer landing as a new user turn ~20s after the run ended.
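The gating described above can be sketched as a tiny class; only agent-side thread safety, _agent_running, and the queue-vs-inline split come from the commit message itself — the class shape and handler names here are hypothetical:

```python
class CliSketch:
    """Hypothetical stand-in for the classic CLI's input handling."""

    def __init__(self):
        self._agent_running = False
        self._pending_input: list[str] = []   # drained by process_loop when idle
        self.dispatched: list[str] = []       # records inline dispatches

    def process_command(self, text: str) -> None:
        # In the real CLI this routes to agent.steer(), which is thread-safe
        # and returns instantly, so calling it on the UI thread is safe.
        self.dispatched.append(text)

    def _should_handle_steer_command_inline(self, text: str) -> bool:
        """True when /steer must bypass the queue: an agent run is in
        progress, so process_loop (blocked inside chat()) can't drain it."""
        return self._agent_running and text.strip().startswith("/steer")

    def on_enter(self, text: str) -> None:
        if self._should_handle_steer_command_inline(text):
            self.process_command(text)        # inline, mid-run
        else:
            self._pending_input.append(text)  # idle path: queue and drain
```

The idle path is untouched on purpose: a /steer with no active agent still queues, preserving the existing fallback message.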
handsdiff pushed a commit that referenced this pull request on Apr 21, 2026
Previously the breaker was only cleared when the post-reconnect retry call itself succeeded (via _reset_server_error at the end of the try block). If OAuth recovery succeeded but the retry call happened to fail for a different reason, control fell through to the needs_reauth path which called _bump_server_error — adding to an already-tripped count instead of the fresh count the reconnect justified. With fix #1 in place this would still self-heal on the next cooldown, but we should not pay a 60s stall when we already have positive evidence the server is viable.

Move _reset_server_error(server_name) up to immediately after the reconnect-and-ready-wait block, before the retry_call. The subsequent retry still goes through _bump_server_error on failure, so a genuinely broken server re-trips the breaker as normal — but the retry starts from a clean count (1 after a failure), not a stale one.
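A minimal sketch of the reordering, with the breaker helpers passed in as callables; the names mirror the commit message (_reset_server_error, _bump_server_error, retry_call) but the surrounding function shape is an assumption:

```python
def call_after_reconnect(server_name, reconnect, wait_ready, retry_call,
                         _reset_server_error, _bump_server_error):
    """Retry a server call after a successful reconnect, with the breaker
    cleared BEFORE the retry rather than after it succeeds."""
    reconnect(server_name)
    wait_ready(server_name)
    # A successful reconnect + ready-wait is positive evidence the server
    # is viable: clear the breaker now, so an unrelated retry failure
    # bumps a fresh count (1), not a stale already-tripped one.
    _reset_server_error(server_name)
    try:
        return retry_call()
    except Exception:
        # A genuinely broken server still re-trips the breaker as normal.
        _bump_server_error(server_name)
        raise
```

The ordering is the whole fix: reset on reconnect evidence, bump only on fresh failures.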
Summary
In context_compressor.py, Pass 3 truncates large tool_call arguments strings by byte-slicing to 200 chars and appending the literal sentinel ...[truncated]. The result is invalid JSON: an unterminated string cut mid-value. Some providers (notably Anthropic via LiteLLM) re-parse function.arguments when rebuilding tool_use blocks. Once a session is compressed with any tool_call whose original args exceeded 500 chars (routine for patch), the next API call fails with HTTP 400 ("Unterminated string starting at: line 1 column 35"). The 3 retries send identical payloads, so they all fail identically, and every subsequent user turn replays the corrupted history. The session is permanently stuck and the user sees only a canned error fallback.
Fix
Replace the byte slice with a compact JSON sentinel that preserves provenance (original length + 200-char preview) while staying parseable:
{"_truncated": true, "_original_length": 1247, "_preview": "..."}

Observed in the wild
A production session had 11 prior patch tool_calls poisoned by this truncation after compression. Every subsequent user message triggered a new failure cycle — the user saw 3 cryptic LiteLLM 400 error strings back-to-back with no way to recover (the poison is in server-side history).

Test plan
- New test test_pruned_tool_call_args_remain_valid_json asserts the pruned arguments are parseable via json.loads and carry a sensible sentinel shape.
- Existing TestTokenBudgetTailProtection tests still pass.
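A sketch of what that assertion shape could look like; the sentinel here is built inline as a stand-in, since the real test would obtain it by running _prune_old_tool_results over a message list with >500-char arguments:

```python
import json


def test_pruned_tool_call_args_remain_valid_json():
    # Hypothetical stand-in for the pruner's output; the real test calls
    # _prune_old_tool_results and reads the pruned arguments string back.
    pruned = json.dumps({
        "_truncated": True,
        "_original_length": 1247,
        "_preview": '{"path": "a.py", "diff": "...',
    })

    # Must not raise — surviving json.loads is the whole point of the fix.
    parsed = json.loads(pruned)

    # Sentinel shape: flag, provenance, and a preview shorter than the
    # original it summarizes.
    assert parsed["_truncated"] is True
    assert parsed["_original_length"] > len(parsed["_preview"])
    assert isinstance(parsed["_preview"], str)
```

The parse-then-shape structure keeps the test meaningful even if the preview length or field order changes later.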